A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM

Authors

Abstract

The massive influx of text, images, and videos on the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video descriptions has been a challenge for decades. Recent experiments on image/video captioning that employ Long Short-Term Memory (LSTM) networks have piqued the interest of researchers studying their possible application to captioning. The proposed architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder with a unidirectional decoder, and also considers temporal relations when creating superior global representations. In contrast to the majority of prior work, the most relevant features are selected and utilized for specific purposes. Existing methods employ a single-layer attention mechanism to link the input with the phrase meaning. The proposed approach uses LSTMs to extract characteristics from videos, constructs links between multi-modal (word and visual material) representations, and generates sentences with rich semantic coherence. In addition, we evaluated the performance of the suggested system on a benchmark dataset; the obtained results reveal competitive METEOR and promising BLEU scores relative to state-of-the-art works. In terms of quantitative performance, the approach outperforms existing methodologies.
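The attention step described in the abstract, in which each decoder state selects the most relevant encoder states before emitting a word, can be sketched roughly as follows. This is a minimal illustration with toy dimensions and plain dot-product scoring; the paper's actual layer sizes, scoring function, and multi-layer attention design are not given here, so every name and number below is an assumption.

```python
# Minimal sketch of one attention step over bidirectional encoder states.
# Dimensions and scoring are illustrative, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

T, H = 6, 4                               # 6 video frames, hidden size 4 per direction
fwd = rng.standard_normal((T, H))         # forward LSTM states (placeholder values)
bwd = rng.standard_normal((T, H))         # backward LSTM states (placeholder values)
enc = np.concatenate([fwd, bwd], axis=1)  # (T, 2H) bidirectional encoding

dec_state = rng.standard_normal(2 * H)    # current decoder hidden state

def attend(states, query):
    """Dot-product attention: weight each encoder state by its
    relevance to the decoder query, then return the weighted sum."""
    scores = states @ query                 # (T,) relevance of each time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over time steps
    return weights, weights @ states        # context vector of shape (2H,)

weights, context = attend(enc, dec_state)
print(weights.round(3), context.shape)
```

At each decoding step the resulting context vector would be combined with the decoder input, so generation focuses on the frames most relevant to the next word.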


Similar resources

Disfluency Detection Using a Bidirectional LSTM

We introduce a new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM). In addition to the word sequence, the model takes as input pattern-match features that were developed to reduce sensitivity to vocabulary size in training, which leads to improved performance over the word sequence alone. The BLSTM takes advantage of explicit repair states in...


Word Sense Disambiguation using a Bidirectional LSTM

In this paper we present a model that leverages a bidirectional long short-term memory network to learn word sense disambiguation directly from data. The approach is end-to-end trainable and makes effective use of word order. Further, to improve the robustness of the model we introduce dropword, a regularization technique that randomly removes words from the text. The model is evaluated on two ...
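The dropword regularizer described above, which randomly removes words from the input text during training, can be sketched in a few lines. The function name, default drop probability, and interface below are hypothetical illustrations, not the paper's exact procedure.

```python
# Hypothetical sketch of "dropword": each token is independently removed
# with probability p during training, forcing the model not to over-rely
# on any single word.
import random

def dropword(tokens, p=0.1, rng=None):
    """Return a copy of `tokens` with each word dropped with probability p."""
    rng = rng or random.Random()
    return [t for t in tokens if rng.random() >= p]

sentence = "the model randomly removes words from the text".split()
kept = dropword(sentence, p=0.3, rng=random.Random(42))
print(kept)
```

Like standard dropout, this would be applied only at training time; at evaluation the full sentence is used.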


A Novel Design of a Multi-layer 2:4 Decoder using Quantum-Dot Cellular Automata

The quantum-dot cellular automata (QCA) is considered as an alternative to complementary metal oxide semiconductor (CMOS) technology, based on physical phenomena like Coulomb interaction, to overcome the physical limitations of this technology. The decoder is one of the important components in digital circuits, which can be used in more comprehensive circuits such as full adde...


Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection

As humans we possess an intuitive ability for navigation which we master through years of practice; however, existing approaches to model this trait for diverse tasks including monitoring pedestrian flow and detecting abnormal events have been limited by using a variety of hand-crafted features. Recent research in the area of deep learning has demonstrated the power of learning features directly ...


Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention

In this paper, we propose a sentence encoding-based model for recognizing text entailment. In our approach, the encoding of a sentence is a two-stage process. First, average pooling is used over word-level bidirectional LSTM (biLSTM) outputs to generate a first-stage sentence representation. Second, an attention mechanism is employed to replace average pooling on the same sentence for better represent...



Journal

Journal title: Journal of Big Data

Year: 2022

ISSN: 2196-1115

DOI: https://doi.org/10.1186/s40537-022-00664-6